Bacterial genomes are generally smaller and vary less in size among species than the genomes of eukaryotes. Bacterial genomes range in size from about 130 kbp to over 14 Mbp. A study that included, but was not limited to, 478 bacterial genomes concluded that as genome size increases, the number of genes increases at a disproportionately slower rate in eukaryotes than in non-eukaryotes. Thus, the proportion of non-coding DNA goes up with genome size more quickly in non-bacteria than in bacteria. This is consistent with the fact that most eukaryotic nuclear DNA is non-gene coding, while the majority of prokaryotic, viral, and organellar DNA is coding.
Genome sequences are currently available from 50 different bacterial phyla and 11 different archaeal phyla. Second-generation sequencing has yielded many draft genomes (close to 90% of bacterial genomes in GenBank are currently not complete); third-generation sequencing might eventually yield a complete genome in a few hours. The genome sequences reveal much diversity in bacteria. Analysis of over 2000 Escherichia coli genomes reveals an E. coli core genome of about 3100 gene families and a total of about 89,000 different gene families. This article contains quotations from this source, which is available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. Genome sequences show that parasitic bacteria have 500–1200 genes, free-living bacteria have 1500–7500 genes, and archaea have 1500–2700 genes.
A striking discovery by Cole et al. described massive amounts of gene decay when comparing the leprosy bacillus to ancestral bacteria. Studies have since shown that several bacteria have smaller genome sizes than their ancestors did. Over the years, researchers have proposed several theories to explain the general trend of bacterial genome decay and the relatively small size of bacterial genomes.
Compelling evidence indicates that the apparent degradation of bacterial genomes is owed to a deletional bias.
The single gene comparison is now being supplanted by more general methods. These methods have resulted in novel perspectives on genetic relationships that previously have only been estimated.
A significant achievement in the second decade of bacterial genome sequencing was the production of metagenomic data, which covers all DNA present in a sample. Previously, there were only two metagenomic projects published.
Furthermore, amongst species of bacteria, there is relatively little variation in genome size when compared with the genome sizes of other major groups of life. Genome size is of little relevance when considering the number of functional genes in eukaryotic species. In bacteria, however, the strong correlation between the number of genes and the genome size makes the size of bacterial genomes an interesting topic for research and discussion.
The general trends of bacterial evolution indicate that bacteria started as free-living organisms. Evolutionary paths led some bacteria to become pathogens and symbionts. The lifestyles of bacteria play an integral role in their respective genome sizes. Free-living bacteria have the largest genomes of the three types of bacteria; however, they have fewer pseudogenes than bacteria that have recently acquired pathogenicity.
Facultative and recently evolved pathogenic bacteria exhibit a smaller genome size than free-living bacteria, yet they have more pseudogenes than any other form of bacteria.
Obligate bacterial symbionts or pathogens have the smallest genomes and the fewest pseudogenes of the three groups. The relationship between the lifestyles of bacteria and genome size raises questions as to the mechanisms of bacterial genome evolution. Researchers have developed several theories to explain the patterns of genome size evolution amongst bacteria.
To extract information about bacterial genomes, core- and pan-genome sizes have been assessed for several strains of bacteria. In 2012, the number of core gene families was about 3000. However, by 2015, with an over tenfold increase in available genomes, the pan-genome had increased as well. There is roughly a positive correlation between the number of genomes added and the growth of the pan-genome. On the other hand, the core genome has remained static since 2012. Currently, the E. coli pan-genome is composed of about 90,000 gene families. About one-third of these exist only in a single genome. Many of these, however, are merely gene fragments and the result of calling errors. Still, there are probably over 60,000 unique gene families in E. coli.
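The core and pan-genome described above can be illustrated with simple set operations. The following is a minimal sketch, assuming each genome has already been reduced to a set of gene-family identifiers; the function name `core_and_pan_genome` and the toy strains are hypothetical and are not taken from any published pipeline.

```python
from collections import Counter


def core_and_pan_genome(genomes):
    """Return (core, pan, singletons) for a dict of genome name -> gene families.

    core:       families present in every genome
    pan:        families present in at least one genome
    singletons: families present in exactly one genome
    """
    family_sets = [set(families) for families in genomes.values()]
    if not family_sets:
        return set(), set(), set()
    pan = set().union(*family_sets)
    core = set.intersection(*family_sets)
    # Count in how many genomes each family occurs.
    counts = Counter(fam for fams in family_sets for fam in fams)
    singletons = {fam for fam, n in counts.items() if n == 1}
    return core, pan, singletons


# Hypothetical toy data: three strains with made-up gene-family IDs.
toy_genomes = {
    "strain_A": {"fam1", "fam2", "fam3", "fam4"},
    "strain_B": {"fam1", "fam2", "fam5"},
    "strain_C": {"fam1", "fam2", "fam3", "fam6"},
}

core, pan, singletons = core_and_pan_genome(toy_genomes)
print(len(core), len(pan), len(singletons))
```

Adding a genome can only enlarge the pan-genome or leave it unchanged, and can only shrink the core or leave it unchanged, which matches the pattern of a growing pan-genome and a stable core. Real analyses first have to cluster genes into families, a step this sketch does not attempt.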
Small genome size in obligately host-associated species is associated with certain particularities, such as rapid evolution of polypeptide sequences and low GC content in the genome. The convergent evolution of these qualities in unrelated bacteria suggests that an obligate association with a host promotes genome reduction.
Given that over 80% of almost all of the fully sequenced bacterial genomes consist of intact ORFs, and that gene length is nearly constant at ~1 kb per gene, it is inferred that small genomes have few metabolic capabilities. While free-living bacteria, such as E. coli, Salmonella species, or Bacillus species, usually have 1500 to 6000 proteins encoded in their DNA, obligately pathogenic bacteria often have as few as 500 to 1000 such proteins.
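The inference from genome size to gene count is simple arithmetic, sketched below. The function name `estimate_gene_count` is hypothetical, the default coding fraction of 0.8 treats the "over 80%" figure as a lower bound, and the two genome sizes are illustrative values rather than measurements of particular organisms.

```python
def estimate_gene_count(genome_size_bp, coding_fraction=0.8, mean_gene_length_bp=1000):
    """Rough gene-count estimate: coding base pairs divided by mean gene length.

    coding_fraction and mean_gene_length_bp default to the approximate
    figures quoted in the text (about 80% intact ORFs, about 1 kb per gene).
    """
    if genome_size_bp < 0:
        raise ValueError("genome_size_bp must be non-negative")
    if not 0 < coding_fraction <= 1:
        raise ValueError("coding_fraction must be in (0, 1]")
    if mean_gene_length_bp <= 0:
        raise ValueError("mean_gene_length_bp must be positive")
    return round(genome_size_bp * coding_fraction / mean_gene_length_bp)


# Illustrative genome sizes (assumed values): a reduced and a larger genome.
for size_bp in (600_000, 4_600_000):
    print(size_bp, estimate_gene_count(size_bp))
```

Under these assumptions, a genome of a few hundred kilobases can encode only a few hundred proteins, which is why small genomes are inferred to have limited metabolic capabilities.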
One candidate explanation is that reduced genomes maintain genes that are necessary for vital processes pertaining to cellular growth and DNA replication, in addition to those genes that are required to survive in the bacteria's ecological niche. However, sequence data contradict this hypothesis. The set of universal orthologs amongst eubacteria comprises only 15% of each genome. Thus, each lineage has taken a different evolutionary path to reduced size. Because universal cellular processes require over 80 genes, variation in genes implies that the same functions can be achieved by exploitation of nonhomologous genes.
Host-dependent bacteria are able to secure many compounds required for metabolism from the host's cytoplasm or tissue. They can, in turn, discard their own biosynthetic pathways and associated genes. This removal explains many of the specific gene losses. For example, Rickettsia species, which rely on specific energy substrates from their hosts, have lost many of their native energy metabolism genes. Similarly, most small genomes have lost their amino acid biosynthesis genes, as these compounds are supplied by the host instead. One exception is Buchnera, an obligate maternally transmitted symbiont of aphids. It retains 54 genes for biosynthesis of crucial amino acids, but no longer has pathways for those amino acids that the host can synthesize. Pathways for nucleotide biosynthesis are gone from many reduced genomes. Those anabolic pathways that evolved through niche adaptation remain in particular genomes.
The hypothesis that unused genes are eventually removed does not explain why many of the removed genes would indeed remain helpful in obligate pathogens. For example, many eliminated genes code for products that are involved in universal cellular processes, including replication, transcription, and translation. Even genes supporting DNA recombination and repair are deleted from every small genome. However, some genes, such as those encoding the RecA protein, were found to be nearly ubiquitous, indicating that a large majority of bacterial genomes are probably capable of homologous recombination. In addition, small genomes have fewer tRNAs, utilizing one for several amino acids. So, a single anticodon pairs with multiple codons, which likely yields less-than-optimal translation machinery. It is unknown why obligate intracellular pathogens would benefit by retaining fewer tRNAs and fewer DNA repair enzymes.
Another factor to consider is the change in population that corresponds to an evolution towards an obligately pathogenic life. Such a shift in lifestyle often results in a reduction in the genetic population size of a lineage, since there is a finite number of hosts to occupy. The resulting genetic drift may lead to fixation of mutations that inactivate otherwise beneficial genes, or may decrease the efficiency of gene products. Hence, not only will useless genes be lost (as mutations disrupt them once the bacteria have settled into host dependency), but beneficial genes may also be lost if genetic drift renders purifying selection ineffective.
The number of universally maintained genes is small and inadequate for independent cellular growth and replication, so that small genome species must achieve such feats by means of varying genes. This is done partly through nonorthologous gene displacement. That is, the role of one gene is replaced by another gene that achieves the same function. Redundancy within the ancestral, larger genome is eliminated. The descendant small genome content depends on the content of chromosomal deletions that occur in the early stages of genome reduction.
The very small genome of M. genitalium possesses dispensable genes. In a study in which single genes of this organism were inactivated using transposon-mediated mutagenesis, at least 129 of its 484 ORFs were not required for growth. A much smaller genome than that of M. genitalium is therefore feasible.
Free-living bacteria tend to have large population sizes and are subject to more opportunity for gene transfer. As such, selection can effectively operate on free-living bacteria to remove deleterious sequences, resulting in a relatively small number of pseudogenes. Further selective pressure is evident, as free-living bacteria must produce all gene products independently of a host. Given that there is sufficient opportunity for gene transfer to occur and there are selective pressures against even slightly deleterious deletions, it is intuitive that free-living bacteria should have the largest genomes of all bacteria types.
Recently formed parasites undergo severe bottlenecks and can rely on host environments to provide gene products. As such, in recently formed and facultative parasites, there is an accumulation of pseudogenes and transposable elements due to a lack of selective pressure against deletions. The population bottlenecks reduce gene transfer and as such, deletional bias ensures the reduction of genome size in parasitic bacteria.
Obligatory parasites and symbionts have the smallest genome sizes due to prolonged effects of deletional bias. Parasites that have evolved to occupy specific niches are not exposed to much selective pressure. As such, genetic drift dominates the evolution of niche-specific bacteria. Extended exposure to deletional bias ensures the removal of most superfluous sequences. Symbionts occur in drastically lower numbers and undergo the most severe bottlenecks of any bacterial type. There is almost no opportunity for gene transfer for endosymbiotic bacteria, and thus genome compaction can be extreme. One of the smallest bacterial genomes ever to be sequenced is that of the endosymbiont Carsonella ruddii.
This is not to suggest that all bacterial genomes are reducing in size and complexity. While many types of bacteria have reduced in genome size from an ancestral state, there are still a huge number of bacteria that maintained or increased genome size over ancestral states. Free-living bacteria experience huge population sizes, fast generation times and a relatively high potential for gene transfer. While deletional bias tends to remove unnecessary sequences, selection can operate significantly amongst free-living bacteria resulting in evolution of new genes and processes.
Bacteria have more variation in their metabolic properties, cellular structures, and lifestyles than can be accounted for by point mutations alone. For example, none of the phenotypic traits that distinguish E. coli from Salmonella enterica can be attributed to point mutation. On the contrary, evidence suggests that horizontal gene transfer has bolstered the diversification and speciation of many bacteria.
Horizontal gene transfer is often detected via DNA sequence information. DNA segments obtained by this mechanism often reveal a narrow phylogenetic distribution between related species. Furthermore, these regions sometimes display an unexpected level of similarity to genes from taxa that are assumed to be quite divergent.
Although gene comparisons and phylogenetic studies are helpful in investigating horizontal gene transfer, the DNA sequences of genes are even more revelatory of their origin and ancestry within a genome. Bacterial species differ widely in overall GC content, although the genes in any one species' genome are roughly identical with respect to base composition, patterns of codon usage, and frequencies of di- and trinucleotides. As a result, sequences that are newly acquired through lateral transfer can be identified via their characteristics, which remain those of the donor. For example, many of the S. enterica genes that are not present in E. coli have base compositions that differ from the overall 52% GC content of the entire chromosome. Within this species, some lineages have more than a megabase of DNA that is not present in other lineages. The base compositions of these lineage-specific sequences imply that at least half of these sequences were captured through lateral transfer. Furthermore, the regions adjacent to horizontally obtained genes often have remnants of translocatable elements, transfer origins of plasmids, or known attachment sites of phage integrases.
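Screening for atypical base composition can be sketched in a few lines. This is a simplified illustration, not a published detection method: the function names, the toy sequences, and the 0.10 deviation threshold are all assumptions, and practical approaches also weigh codon usage and di- and trinucleotide frequencies.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    gc = sum(1 for base in seq if base in "GC")
    acgt = sum(1 for base in seq if base in "ACGT")
    return gc / acgt if acgt else 0.0


def flag_atypical_genes(genes, genome_gc, threshold=0.10):
    """Return {gene name: GC fraction} for genes deviating from genome_gc.

    threshold is an arbitrary illustrative cutoff (an assumption),
    expressed as an absolute difference in GC fraction.
    """
    flagged = {}
    for name, seq in genes.items():
        gc = gc_content(seq)
        if abs(gc - genome_gc) > threshold:
            flagged[name] = gc
    return flagged


# Hypothetical toy sequences; 0.52 is the chromosome-wide GC fraction from the text.
toy_genes = {
    "geneA": "GCATGCATGCAT",  # GC 0.50, close to the genome average
    "geneB": "ATTAATTAATGC",  # GC about 0.17, AT-rich
    "geneC": "GCGCGGCCGCAT",  # GC about 0.83, GC-rich
}
flagged = flag_atypical_genes(toy_genes, genome_gc=0.52)
print(sorted(flagged))
```

Any composition-based screen of this kind misses sequences whose donor resembles the recipient in base composition, so it gives a lower bound on the amount of transferred DNA.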
In some species, a large proportion of laterally transferred genes originate from plasmid-, phage-, or transposon-related sequences.
Although sequence-based methods reveal the prevalence of horizontal gene transfer in bacteria, the results tend to be underestimates of the magnitude of this mechanism, since sequences obtained from donors whose sequence characteristics are similar to those of the recipient will avoid detection.
Comparisons of completely sequenced genomes confirm that bacterial chromosomes are amalgams of ancestral and laterally acquired sequences. The hyperthermophilic Eubacteria Aquifex aeolicus and Thermotoga maritima each have many genes that are similar in protein sequence to homologues in thermophilic Archaea. Of Thermotoga's 1,877 ORFs, 24% show high matches to an Archaeal protein, as do 16% of Aquifex's 1,512 ORFs, while mesophiles such as E. coli and B. subtilis have far smaller proportions of genes that are most like Archaeal homologues.
Transformation involves the uptake of naked DNA from the environment. Through transformation, DNA can be transmitted between distantly related organisms. Some bacterial species, such as Haemophilus influenzae and Neisseria gonorrhoeae, are continuously competent to accept DNA. Other species, such as Bacillus subtilis and Streptococcus pneumoniae, become competent when they enter a particular phase in their lifecycle.
Transformation in N. gonorrhoeae and H. influenzae is effective only if particular recognition sequences are found in the recipient genomes (5'-GCCGTCTGAA-3' and 5'-AAGTGCGGT-3', respectively). Although the existence of certain uptake sequences improves transformation capability between related species, many of the inherently competent bacterial species, such as B. subtilis and S. pneumoniae, do not display sequence preference.
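Counting how often such a recognition sequence occurs in a DNA sequence is straightforward. The sketch below scans both strands for a motif; the function names and the toy sequence are hypothetical, and only the N. gonorrhoeae motif is taken from the text.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Reverse complement of a DNA string; unknown symbols become N."""
    return "".join(COMPLEMENT.get(base, "N") for base in reversed(seq.upper()))


def count_overlapping(text, pattern):
    """Count occurrences of pattern in text, allowing overlaps."""
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def count_uptake_sites(sequence, motif):
    """Return (forward, reverse) counts of motif on the two strands."""
    sequence = sequence.upper()
    motif = motif.upper()
    rc_motif = reverse_complement(motif)
    forward = count_overlapping(sequence, motif)
    # A palindromic motif would otherwise be counted twice.
    reverse = 0 if rc_motif == motif else count_overlapping(sequence, rc_motif)
    return forward, reverse


NG_MOTIF = "GCCGTCTGAA"  # N. gonorrhoeae recognition sequence quoted in the text

# Hypothetical toy sequence: two forward copies and one reverse-strand copy.
toy_sequence = (
    "ATAT" + NG_MOTIF + "CCC" + reverse_complement(NG_MOTIF) + "GG" + NG_MOTIF
)
print(count_uptake_sites(toy_sequence, NG_MOTIF))
```

Scanning both strands matters because the motif can lie on either strand of double-stranded DNA. Applied to a full genome sequence, the same function would give the motif density per strand.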
New genes may be introduced into bacteria by a bacteriophage that has replicated within a donor, through generalized transduction or specialized transduction. The amount of DNA that can be transmitted in one event is constrained by the size of the phage capsid, with an upper limit of about 100 kilobases. While phages are numerous in the environment, the range of microorganisms that can be transduced depends on receptor recognition by the bacteriophage. Transduction does not require donor and recipient cells to be present at the same time or in the same place. Phage-encoded proteins both mediate the transfer of DNA into the recipient cytoplasm and assist integration of DNA into the chromosome.
Conjugation involves physical contact between donor and recipient cells and is able to mediate transfers of genes between domains, such as between bacteria and yeast. DNA is transmitted from donor to recipient by either a self-transmissible or a mobilizable plasmid. Conjugation may also mediate the transfer of chromosomal sequences by plasmids that integrate into the chromosome.
Despite the multitude of mechanisms mediating gene transfer among bacteria, the process's success is not guaranteed unless the received sequence is stably maintained in the recipient. DNA integration can be sustained through one of many processes. One is persistence as an episome, another is homologous recombination, and still another is illegitimate incorporation through fortuitous double-strand break repair.
The capacity for natural transformation appears to be common among prokaryotes, and thus far 67 prokaryotic species (in seven different phyla) are known to undergo this process. Competence for transformation is typically induced by high cell density and/or nutritional limitation, conditions associated with the stationary phase of bacterial growth. Competence is also specifically induced by conditions that damage DNA. For example, transformation is induced in Streptococcus pneumoniae by the DNA damaging agents mitomycin C (a DNA cross-linking agent) and fluoroquinolone (a topoisomerase inhibitor that causes double-strand breaks). In Bacillus subtilis, transformation is stimulated by exposure to UV light, a DNA damaging agent. In Helicobacter pylori, ciprofloxacin, an agent that interacts with DNA gyrase and causes double-strand breaks, induces expression of competence genes, thus increasing the frequency of transformation. Using Legionella pneumophila, Charpentier et al. examined 64 toxic molecules to find out which of these induce competence. Of these toxic compounds, only six, all DNA damaging agents, caused strong induction.
Bacteria that are growing logarithmically differ from stationary phase bacteria with regard to the number of genome copies present in the cell, and this has implications for the ability to carry out an important DNA repair process. During logarithmic growth, two or more copies of any particular region of the chromosome are ordinarily present in a bacterial cell, as cell division is not precisely matched with chromosome replication. Homologous recombinational repair is an important DNA repair process that is particularly effective for repairing double-strand damage, such as double-strand breaks. This process depends on a second homologous chromosome in addition to the damaged chromosome. During logarithmic growth, DNA damage in one chromosome may be removed by homologous recombinational repair using sequence information from the other homologous chromosome. However, when cells approach stationary phase they typically have just one copy of the chromosome, and homologous recombinational repair then requires input of a homologous template from outside the cell by transformation.
Adoption of a pathogenic lifestyle often yields a fundamental shift in an organism's ecological niche. The erratic phylogenetic distribution of pathogenic organisms implies that bacterial virulence is a consequence of the presence or acquisition of genes that are missing in avirulent forms. Evidence of this includes the discovery of large 'virulence' plasmids in pathogenic Shigella and Yersinia, as well as the ability to bestow pathogenic properties onto E. coli via experimental exposure to genes from other species.